2.4 Lower bounds on mean squared errors: information inequalities. When we
consider a parametrized family $\mathcal{P} = \{P_\theta, \theta \in \Theta\}$ of laws, we will always assume that
$P_\theta \ne P_\phi$ for $\theta \ne \phi$.

If $T = T(X_1, \ldots, X_n)$ is a statistic and $X_1, \ldots, X_n$ are i.i.d. ($P_\theta$) then (as before) we let

$$E_\theta T := \int \cdots \int T(x_1, \ldots, x_n)\, dP_\theta(x_1) \cdots dP_\theta(x_n),$$

or if $T$ is a function on the basic sample space ($n = 1$) then $E_\theta T = \int T(x)\,dP_\theta(x)$.
Correspondingly, variances and covariances of real-valued statistics $T$ and $U$ are defined by

$$\mathrm{var}_\theta T := E_\theta(T^2) - (E_\theta T)^2, \qquad \mathrm{cov}_\theta(T,U) := E_\theta(TU) - (E_\theta T)(E_\theta U)$$

when the integrals converge, with $\mathrm{var}_\theta T := +\infty$ if $E_\theta(T^2) = +\infty$. For squared-error loss
$L(\theta,T) = (T - g(\theta))^2$, if $T$ is an unbiased estimator of $g(\theta)$, then the mean squared-error
loss equals $\mathrm{var}_\theta T$. On the other hand, trivial constant estimators as mentioned in the last
section will have variance 0 without being good estimators except for special parameter
values. Inequalities of the type in this section were first found for unbiased estimators, but
there will be a form (Theorem 2.4.12) which applies to estimators that may have a bias.

We are looking for lower bounds for variances of unbiased estimators $T$ of functions
$g(\theta)$. Suppose first that $T$ is an unbiased estimator of $\theta$. Then for any constants $a$ and $b$,
$a + bT$ is an unbiased estimator of $h(\theta) = a + b\theta$, with $\mathrm{var}_\theta(a + bT) = b^2\,\mathrm{var}_\theta T$. This variance
doesn't depend on $a$ and is proportional to $b^2$ where $b$ is the derivative of $h$ (everywhere,
in this simple case). Or more generally, if $T$ is an unbiased estimator of $g(\theta)$ then $a + bT$ is
an unbiased estimator of $a + bg(\theta)$ and $\mathrm{var}_\theta(a + bT) = b^2\,\mathrm{var}_\theta T$. Thus it seems natural that
(lower) bounds for the variances of unbiased estimators should be proportional to $g'(\theta)^2$,
as they will be.

Also, recall that for $n$ i.i.d. observations, the sample mean $\bar{X}$ as an estimator for an
unknown mean $\mu$ is unbiased and has a variance equal to $\sigma^2/n$ where $\sigma^2$ is the variance of
one observation. So we can anticipate that lower bounds for the variance of an unbiased
estimator of $g(\theta)$ based on $n$ i.i.d. observations should be of the form $u(\theta)g'(\theta)^2/n$ for
some function $u(\theta)$. This will also turn out to be true (Theorem 2.4.10), so we have to find
suitable functions $u(\theta)$, which are most often written as $1/I(\theta)$ where $I(\cdot)$ is the so-called
Fisher information, to be defined.

A family $\mathcal{P}$ of probability measures will be called equivalent if any two laws $P$ and $Q$
in the family are equivalent, in other words for any measurable set $B$, $P(B) = 0$ if and
only if $Q(B) = 0$. Then $\{P_\theta, \theta \in \Theta\}$ will be equivalent if for some $\sigma$-finite measure $\nu$, $P_\theta$
is equivalent to $\nu$ for all $\theta \in \Theta$. Conversely, if $\mathcal{P}$ is equivalent, we can take any member of
$\mathcal{P}$ as $\nu$. Let the density (Radon-Nikodym derivative) be $f(\theta,x) := (dP_\theta/d\nu)(x)$. Then
$f(\theta,x) > 0$ for $\nu$-almost all $x$ and for $P_\phi$-almost all $x$ for each $\phi \in \Theta$. The likelihood ratio
$R_{\phi,\theta} := R_{P_\phi/P_\theta} = f(\phi,x)/f(\theta,x)$ will be defined, with $0 < R_{\phi,\theta} < \infty$ for almost all
$x$ in the same sense; $0/0$ will be defined as 0 in this case. Here is a first lower bound on
variances of unbiased estimators. Note that in it, there is no restriction on the parameter
space $\Theta$, which could be an arbitrary set.

2.4.1 Theorem. Suppose $T$ is an unbiased estimator of a real function $g(\theta)$ for an
equivalent family $\{P_\theta, \theta \in \Theta\}$. Then

$$\mathrm{var}_\theta T \ge \sup\{(g(\phi) - g(\theta))^2 / \mathrm{var}_\theta R_{\phi,\theta} : \phi \in \Theta,\ \phi \ne \theta\}.$$

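Note. The conclusion of the theorem holds trivially if $\mathrm{var}_\theta T = +\infty$ or if $\mathrm{var}_\theta R_{\phi,\theta} = +\infty$
for all $\phi \ne \theta$. So the theorem has content if and only if both $E_\theta(T^2) < \infty$ and $\mathrm{var}_\theta R_{\phi,\theta} < \infty$
for at least one value of $\phi \ne \theta$. For $\phi \ne \theta$, since $P_\theta \ne P_\phi$, $R_{\phi,\theta}$ is non-constant with respect
to $P_\theta$, so its variance is non-zero.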

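Proof. Since $T$ is unbiased, $\int T(x)f(\phi,x)\,d\nu(x) = g(\phi)$ for all $\phi$, and

$$\int T(x)\,\frac{f(\phi,x) - f(\theta,x)}{f(\theta,x)}\,f(\theta,x)\,d\nu(x) = g(\phi) - g(\theta),$$

$$\mathrm{cov}_\theta(T, R_{\phi,\theta}) = \int (T(x) - g(\theta))\left(\frac{f(\phi,x)}{f(\theta,x)} - 1\right) dP_\theta(x)
= g(\phi) - 2g(\theta) + g(\theta) = g(\phi) - g(\theta).$$

Then by the Cauchy-Bunyakovsky-Schwarz inequality (RAP, 5.1.4),

$$\mathrm{var}_\theta T \ge (g(\phi) - g(\theta))^2 / \mathrm{var}_\theta R_{\phi,\theta},$$

where $\mathrm{var}_\theta R_{\phi,\theta} > 0$ for $\theta \ne \phi$; then take the supremum over $\phi \ne \theta$. □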
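
As an illustration of Theorem 2.4.1, the following is a minimal sketch for one Poisson observation with $g(\theta) = \theta$. The function names and the truncation of the sum at 150 terms are choices made for this example only. For this family a direct computation gives $\mathrm{var}_\theta R_{\phi,\theta} = \exp((\phi-\theta)^2/\theta) - 1$, so the ratio in the theorem should increase toward $\theta$, the variance of the observation itself, as $\phi \to \theta$.

```python
import math

def poisson_log_pmf(theta, j):
    """log P_theta(j) for the Poisson law with parameter theta > 0."""
    return -theta + j * math.log(theta) - math.lgamma(j + 1)

def var_likelihood_ratio(phi, theta, terms=150):
    """var_theta R_{phi,theta} = E_theta(R^2) - 1, via a truncated sum over j."""
    second_moment = 0.0
    for j in range(terms):
        # R^2 * P_theta(j) = P_phi(j)^2 / P_theta(j), assembled in logs.
        second_moment += math.exp(
            2.0 * poisson_log_pmf(phi, j) - poisson_log_pmf(theta, j)
        )
    return second_moment - 1.0

def ratio_bound(phi, theta):
    """(g(phi) - g(theta))^2 / var_theta R_{phi,theta} for g(theta) = theta."""
    return (phi - theta) ** 2 / var_likelihood_ratio(phi, theta)

theta = 2.0
bounds = [ratio_bound(theta + d, theta) for d in (1.0, 0.5, 0.1, 0.01)]
print(bounds)  # values increase toward theta as phi approaches theta
```

In this example the supremum is approached only in the limit $\phi \to \theta$, which is the situation treated in Theorem 2.4.2.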

In the rest of this section, $\Theta$ is an open interval in $\mathbb{R}$. Often, the function $g(\theta)$ to be
estimated is just $\theta$. Then $g$ has the derivative $g' = 1$, so that all the further facts in this
section in terms of $g'(\theta)$ simplify.

2.4.2 Theorem. Assume that $T$ is an unbiased estimator of $g(\theta)$ for an equivalent family
$\{P_\theta, \theta \in \Theta\}$, $\Theta$ is an open interval in $\mathbb{R}$, $g$ has a derivative at $\theta$ and as $\phi \to \theta$, for some
$J(\theta)$, $(\mathrm{var}_\theta R_{\phi,\theta})/(\phi - \theta)^2 \to J(\theta)$. Then if $g'(\theta) \ne 0$ or $J(\theta) > 0$,

$$\mathrm{var}_\theta T \ge g'(\theta)^2 / J(\theta).$$

Proof. In Theorem 2.4.1, divide numerator and denominator by $(\phi - \theta)^2$ and let $\phi \to \theta$.
If $J(\theta) = 0$, $\mathrm{var}_\theta T$ must be $+\infty$, so the conclusion follows. □

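Note that for any $\phi \ne \theta$, $\mathrm{var}_\theta R_{\phi,\theta}/(\phi - \theta)^2 = E_\theta(((R_{\phi,\theta} - 1)/(\phi - \theta))^2)$ and
$R_{\theta,\theta} = 1$. Suppose that in $J(\theta)$, the limit as $\phi \to \theta$ can be interchanged with the integral
$E_\theta$, so that the integrands converge. Their limit is then the square of a partial derivative,
$(\partial R_{\phi,\theta}/\partial\phi\,|_{\phi=\theta})^2$, which can also be written as

$$\left(\frac{\partial f(\theta,x)/\partial\theta}{f(\theta,x)}\right)^2 = \left(\frac{\partial \log f(\theta,x)}{\partial\theta}\right)^2.$$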
The quantity $\partial \log f(\theta,x)/\partial\theta$ is known as the score function. If the derivatives in the last
display exist for almost all $x$, then the quantity

$$I(\theta) := E_\theta((\partial \log f(\theta,x)/\partial\theta)^2) = \int (\partial f(\theta,x)/\partial\theta)^2 / f(\theta,x)\,d\nu(x)$$

is called the information of the family $\{P_\theta\}$ at $\theta$. It is by no means the same as the
"information" studied in information theory. $I(\theta)$ is often called the Fisher information.
Fisher made good use of it, but it was originally due to Edgeworth, see the Notes.
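
The relation between $J(\theta)$ and $I(\theta)$ can be checked by hand in a very small family. The sketch below uses the Bernoulli laws on $\{0, 1\}$ with parameter $p$ and counting measure as $\nu$; this choice of family, the helper names, and the finite-difference step are assumptions of the example, not part of the text. Here $\mathrm{var}_p R_{q,p}/(q-p)^2$ equals $1/(p(1-p))$ for every $q \ne p$, which is also the value of $I(p)$.

```python
def density(p, x):
    """f(p, x) for the Bernoulli law: p at x = 1 and 1 - p at x = 0."""
    return p if x == 1 else 1.0 - p

def information(p, step=1e-6):
    """I(p) = sum over x of (df/dp)^2 / f, with df/dp by a central difference."""
    total = 0.0
    for x in (0, 1):
        derivative = (density(p + step, x) - density(p - step, x)) / (2.0 * step)
        total += derivative ** 2 / density(p, x)
    return total

def scaled_ratio_variance(q, p):
    """var_p R_{q,p} / (q - p)^2, where R_{q,p}(x) = f(q, x) / f(p, x)."""
    second_moment = sum(density(q, x) ** 2 / density(p, x) for x in (0, 1))
    return (second_moment - 1.0) / (q - p) ** 2

p = 0.3
print(information(p), scaled_ratio_variance(0.35, p), 1.0 / (p * (1.0 - p)))
```

In this family the difference quotient does not depend on $q$, so no interchange of limit and integral is at issue; the families of interest in this section are those where it is.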

A famous inequality, $\mathrm{var}_\theta T \ge g'(\theta)^2/I(\theta)$, then follows from the interchange of limits.
One set of sufficient conditions for the interchange will imply that the identities

$$1 = \int f(\theta,x)\,d\nu(x) \quad\text{and}\quad g(\theta) = \int T(x)f(\theta,x)\,d\nu(x)$$

can be differentiated with respect to $\theta$ under the integral sign, as follows:

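2.4.3 Information inequality. Let $T$ be an unbiased estimator of a function $g(\cdot)$ on an
open interval $\Theta$ for an equivalent family $\{P_\theta, \theta \in \Theta\}$. For a given value of $\theta$, assume that
$\partial f(\theta,x)/\partial\theta$ exists for almost all $x$, $I(\theta) > 0$ and that

(2.4.4) $\int (|T(x)| + 1)\left|\frac{\partial f(\theta,x)}{\partial\theta}\right| d\nu(x) < \infty$,

(2.4.5) $0 = \int \frac{\partial f(\theta,x)}{\partial\theta}\,d\nu(x)$, and

(2.4.6) $g'(\theta) = \int T(x)\frac{\partial f(\theta,x)}{\partial\theta}\,d\nu(x)$.

Then

$$\mathrm{var}_\theta T \ge g'(\theta)^2 / I(\theta).$$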



Notes. The information inequality has been called the Cramér-Rao inequality, but Fréchet
found it earlier and Darmois also played a part (see the Notes). Existence and finiteness
of the Lebesgue integrals in both (2.4.5) and (2.4.6) is equivalent to (2.4.4).

Proof. Multiplying (2.4.5) by $g(\theta)$ and subtracting from (2.4.6) gives

$$g'(\theta) = E_\theta((T(x) - g(\theta))\,\partial \log f(\theta,x)/\partial\theta).$$

If $\mathrm{var}_\theta T = +\infty$ or $I(\theta) = +\infty$, the inequality holds trivially since $g'(\theta)$ is finite. If $\mathrm{var}_\theta T$
and $I(\theta)$ are both finite, then the Cauchy-Bunyakovsky-Schwarz inequality can be applied
as in the proof of Theorem 2.4.1 to get $g'(\theta)^2 \le I(\theta)\,\mathrm{var}_\theta T$. □

Most books state the information inequality under assumptions such as those of
Theorem 2.4.3. Exchanging differentiation with an integral as in (2.4.5) and (2.4.6) may seem
a plausible and reasonable kind of hypothesis. But an example will be given below
(Proposition 2.4.13) showing that assumption (2.4.6) may not hold even when (2.4.4) does, and
where each derivative and integral in (2.4.6) is well-defined and finite. So let's see how
(2.4.4) can be strengthened enough to imply (2.4.5) and (2.4.6), by way of the notion of
uniform integrability (RAP, Section 10.3). A set $\mathcal{F}$ of integrable functions on a probability
space $(X, \mathcal{S}, \mu)$ is uniformly integrable iff

$$\lim_{M\to\infty} \sup\{E(|f|\,1_{\{|f| > M\}}) : f \in \mathcal{F}\} = 0.$$

This will hold if (but not only if) there is an integrable function $g$ with $|f| \le g$ for all
$f \in \mathcal{F}$.

2.4.7 Theorem. Assume that for a given $\theta$, $\partial f(\theta,x)/\partial\theta$ exists for almost all $x$ and there
is a $\delta > 0$ such that the functions

$$(|T(x)| + 1)(f(\phi,x) - f(\theta,x))/(\phi - \theta) \quad\text{for } 0 < |\phi - \theta| < \delta$$

are uniformly integrable for $\nu$, or equivalently the functions

$$(|T(x)| + 1)(R_{\phi,\theta} - 1)/(\phi - \theta) \quad\text{for } 0 < |\phi - \theta| < \delta$$

are uniformly integrable with respect to $P_\theta$. Then (2.4.4), (2.4.5) and (2.4.6) all hold.

Proof. The conditions follow from convergence of integrals of pointwise convergent,
uniformly integrable functions (RAP, Theorem 10.3.6). □

Theorems 2.4.3 and 2.4.7 have been stated for one unbiased estimator $T$, but the
information inequality has usually been stated as applying to all unbiased estimators, with
hypotheses (2.4.4) and (2.4.6) assumed for all such estimators of $g(\theta)$. If attention is
restricted to estimators measurable for a Lehmann-Scheffé sufficient $\sigma$-algebra, the unbiased
estimator (if it exists) is unique by Theorem 2.3.5 and the results of this section are not
needed to choose between unbiased estimators (although Theorem 2.4.12 could be helpful
for other estimators). Otherwise, there can be a large family of different unbiased
estimators of $g(\theta)$. So it may not really be clear what it means, in terms of the family of laws
$P_\theta$, that (2.4.4) and (2.4.6) hold for all unbiased estimators of $g$. An alternate sufficient
condition will be stated just in terms of $g$ and the family $\{P_\theta, \theta \in \Theta\}$:

2.4.8 Theorem. The information inequality holds for a given $\theta$ for every unbiased
estimator $T$ of $g(\cdot)$, if: $\Theta$ is an open interval in $\mathbb{R}$, $\{P_\theta, \theta \in \Theta\}$ is an equivalent family, $g$ has
a non-zero derivative at $\theta$, $\partial f(\theta,x)/\partial\theta$ exists for almost all $x$, and there is a $\delta > 0$ such
that the set of functions $((R_{\phi,\theta} - 1)/(\phi - \theta))^2$ for $0 < |\phi - \theta| < \delta$ is uniformly integrable
for $P_\theta$.

Proof. Applying Theorem 2.4.2, we have

$$J(\theta) = \lim_{\phi\to\theta} (\mathrm{var}_\theta R_{\phi,\theta})/(\phi - \theta)^2 = \lim_{\phi\to\theta} E_\theta([(R_{\phi,\theta} - 1)/(\phi - \theta)]^2)
= E_\theta((\partial f(\theta,x)/\partial\theta)^2 / f(\theta,x)^2) = I(\theta),$$

again by convergence of integrals of pointwise convergent, uniformly integrable functions
(RAP, Theorem 10.3.6) and the assumptions. □

If $x = (x_1, \ldots, x_n)$ for i.i.d. $X_i$, then $I(\theta)$ is $n$ times the information for one observation:

2.4.9 Theorem. Suppose that $x = (x_1, \ldots, x_n)$ where $X_i$ are i.i.d. with distribution having
density $f_1(\theta, x_1)$ with respect to $\nu$, so that $dP_\theta^n/d\nu^n = f(\theta,x) = \prod_{1 \le j \le n} f_1(\theta, x_j)$. Also
assume the hypotheses on $f$ and $R_{\phi,\theta}$ in Theorem 2.4.8 hold for $f_1$ and $f_1(\phi,\cdot)/f_1(\theta,\cdot)$
respectively. Then $I(\theta) = nI_1(\theta)$ where $I_1(\theta) := E_\theta((\partial \log f_1(\theta,x_1)/\partial\theta)^2)$.

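Proof. We have $\log f(\theta,x) = \sum_{j=1}^{n} \log f_1(\theta,x_j)$, and

$$I(\theta) = nE_\theta((\partial \log f_1(\theta,x_1)/\partial\theta)^2) + n(n-1)(E_\theta\,\partial \log f_1(\theta,x_1)/\partial\theta)^2.$$

Now since uniform square-integrability implies uniform integrability,

$$E_\theta\,\partial \log f_1(\theta,y)/\partial\theta = \int \lim_{\phi\to\theta} (f_1(\phi,y) - f_1(\theta,y))((\phi - \theta)f_1(\theta,y))^{-1} f_1(\theta,y)\,d\nu(y)
= \lim_{\phi\to\theta} (\phi - \theta)^{-1} \int f_1(\phi,y) - f_1(\theta,y)\,d\nu(y) = 0.$$

Thus $I(\theta) = nI_1(\theta)$. □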
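
The additivity in Theorem 2.4.9 can be confirmed by brute force in a small case. The sketch below enumerates all outcomes of $n$ i.i.d. Bernoulli($p$) observations and sums the squared joint score against the joint density; the Bernoulli family and the function names are assumptions chosen for the example. The cross terms vanish because the score of one observation has mean 0, as in the proof.

```python
from itertools import product

def f1(p, x):
    """Density of one Bernoulli(p) observation with respect to counting measure."""
    return p if x == 1 else 1.0 - p

def score1(p, x):
    """d/dp log f1(p, x): 1/p at x = 1 and -1/(1 - p) at x = 0."""
    return 1.0 / p if x == 1 else -1.0 / (1.0 - p)

def information_n(p, n):
    """E_p of the squared joint score, summed over all 2^n outcomes."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        probability = 1.0
        score = 0.0
        for x in xs:
            probability *= f1(p, x)
            score += score1(p, x)  # the joint score is the sum of the scores
        total += score ** 2 * probability
    return total

p = 0.3
print(information_n(p, 1), information_n(p, 4))  # the second is 4 times the first
```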

For Theorem 2.4.9 to be useful, it will be helpful to know that the information
inequality holds for the case of $n$ i.i.d. observations under hypotheses on the densities $f_1(\theta,x)$
of individual variables. One such fact is as follows:

2.4.10 Theorem. Under the conditions of Theorem 2.4.9, if $T = T(x_1, \ldots, x_n)$ is an
unbiased estimator of $g(\theta)$ and $g'(\theta) \ne 0$ exists, then

$$\mathrm{var}_\theta T \ge g'(\theta)^2/(nI_1(\theta)).$$



Proof. The uniform integrability condition in Theorem 2.4.8 extends to more than one
variable as follows. First, for $n = 2$,

(2.4.11) $R_{\phi,\theta}(x_1)R_{\phi,\theta}(x_2) - 1 = R_{\phi,\theta}(x_1)(R_{\phi,\theta}(x_2) - 1) + (R_{\phi,\theta}(x_1) - 1)$.

To show that a class of functions of the form $(f + g)^2$ is uniformly integrable for $f$ in a
class $\mathcal{F}$ and $g$ in a class $\mathcal{G}$, noting that $(f + g)^2 \le 2f^2 + 2g^2$, it is enough to show that the
sets of functions $f^2$ and $g^2$ are uniformly integrable. Dividing each term on the right of
(2.4.11) by $\phi - \theta$ and squaring, the latter term is uniformly integrable for $0 < |\phi - \theta| < \delta$ by
assumption. For the former term, using independence of $X_1$ and $X_2$, it will be enough to
show that the $R_{\phi,\theta}(X_1)^2$ are uniformly integrable, or equivalently that $(R_{\phi,\theta}(X_1) - 1)^2$ are.
This is clear on multiplying and dividing by $(\phi - \theta)^2$, which is less than $\delta^2$. The uniform
integrability in Theorem 2.4.8 then extends to $n > 2$ by induction, so the information
inequality holds and the form of $I(\theta)$ is given by Theorem 2.4.9. □

Note. If the hypotheses of Theorems 2.4.3 and 2.4.7 hold for unbiased estimators $T(x)$,
which can be viewed as $T(X_1)$, then they do not necessarily follow for unbiased estimators
$T(X_1, \ldots, X_n)$. In fact, often the set of functions $g(\theta)$ that have unbiased estimators
depends on $n$, e.g. for binomial distributions, Section 2.2, Problem 5. So to apply these
theorems to $n > 1$ we would need to check their hypotheses for $x = (x_1, \ldots, x_n)$ rather
than only for $n = 1$.

Examples. (1) Let $f_1(\theta,x)$ be the normal $N(\theta,1)$ density and $g(\theta) = \theta$. Then by
Theorem 2.4.10, $\mathrm{var}_\theta T \ge 1/n$ for any unbiased estimator $T(X_1, \ldots, X_n)$ of $\theta$. This variance
is attained by $T := \bar{X}$, for any $\theta$, so $\bar{X}$ is a "uniformly minimum-variance unbiased
estimator."

(2) Let $\nu$ be counting measure on the nonnegative integers and let $P_\theta$ be the Poisson
law with parameter $\theta$, $P_\theta(j) = e^{-\theta}\theta^j/j!$, $j = 0, 1, \ldots$. Let $g(\theta) = \theta$. Then $I_1 =
E_\theta((j\theta^{-1} - 1)^2) = 1/\theta$ and Theorem 2.4.10 gives $\mathrm{var}_\theta T \ge \theta/n$ for any unbiased estimator
$T$. Again, this minimum variance is attained by the unbiased estimator $\bar{X}$ for all $\theta$.

(3) The information inequality lower bound cannot always be attained. For normal
measures $N(\mu,\sigma^2)$ with $\mu \in \mathbb{R}$ and $\sigma > 0$, $s^2 := \sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1)$ is an unbiased
estimator of $\sigma^2$ with variance $2\sigma^4/(n-1)$ while $I_1(\sigma^2) = 1/(2\sigma^4)$, so the lower bound
given by Theorem 2.4.10 is $2\sigma^4/n$. It will be shown in the next section that $2\sigma^4/(n-1)$
is the smallest variance actually attainable.
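
The numbers in Example (3) can be reproduced numerically. The sketch below treats $v = \sigma^2$ as the parameter of $N(\mu, v)$, integrates the squared score against the density with a trapezoid rule, and compares the resulting bound with the variance of $s^2$ quoted above. The function names, the grid size, and the cutoff at 12 standard deviations are assumptions of this example.

```python
import math

def normal_density(x, mu, v):
    """Density of N(mu, v) at x, where v is the variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def score_in_variance(x, mu, v):
    """d/dv of the log density of N(mu, v)."""
    return -1.0 / (2.0 * v) + (x - mu) ** 2 / (2.0 * v * v)

def information_for_variance(mu, v, points=20001, width=12.0):
    """I_1(v): trapezoid-rule integral of score^2 * density over mu +- width*sd."""
    sd = math.sqrt(v)
    lo = mu - width * sd
    step = 2.0 * width * sd / (points - 1)
    total = 0.0
    for i in range(points):
        x = lo + i * step
        weight = 0.5 if i in (0, points - 1) else 1.0
        total += weight * score_in_variance(x, mu, v) ** 2 * normal_density(x, mu, v)
    return total * step

v, n = 4.0, 10
lower_bound = 1.0 / (n * information_for_variance(0.0, v))  # 2 v^2 / n
variance_of_s2 = 2.0 * v * v / (n - 1)                      # value quoted in Example (3)
print(lower_bound, variance_of_s2)
```

The ratio of the two printed values is $(n-1)/n$, so the gap between the bound and the attainable variance closes as $n$ grows.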

The requirement that an estimator is unbiased can be restrictive, and as we saw in Sec.
2.2, can force a bad choice of estimator. The inequalities proved earlier in this section can
be adapted to give bounds for mean-square errors for more general estimators as follows.

Let $T$ be a statistic used as an estimator of a function $g(\theta)$. Let $b(\theta) := E_\theta T - g(\theta)$
for all $\theta$. Then $b(\theta)$ is called the bias at $\theta$ and is 0 for all $\theta$ if and only if $T$ is an unbiased
estimator of $g$. In general, as long as $E_\theta|T| < \infty$ for all $\theta$, $T$ will always be an unbiased
estimator of $(g + b)(\theta)$, and so:

2.4.12 Theorem. If sufficient conditions for the information inequality hold for $g + b$ in
place of $g$, then for all $\theta$,

$$E_\theta((T - g(\theta))^2) \ge \frac{((g + b)'(\theta))^2}{I(\theta)} + b(\theta)^2.$$

Proof. For any random variable $Y$ with mean $\mu$ and any $c$, we have $E((Y - c)^2) =
\mathrm{var}(Y) + (c - \mu)^2$, so the Theorem follows. □
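
To see Theorem 2.4.12 at work, consider the biased estimator $T = c\bar{X}$ of $\theta$ for $n$ i.i.d. $N(\theta, 1)$ observations, a hypothetical example not taken from the text. Then $b(\theta) = (c-1)\theta$, $(g+b)'(\theta) = c$ and $I(\theta) = n$, so the bound is $c^2/n + (c-1)^2\theta^2$. The sketch below compares this with the mean squared error computed by a trapezoid rule, using the fact that $\bar{X}$ has law $N(\theta, 1/n)$; grid size and cutoff are assumptions.

```python
import math

def mse_of_scaled_mean(c, theta, n, points=20001, width=12.0):
    """E_theta((c*Xbar - theta)^2) for Xbar ~ N(theta, 1/n), by the trapezoid rule."""
    sd = 1.0 / math.sqrt(n)
    lo = theta - width * sd
    step = 2.0 * width * sd / (points - 1)
    total = 0.0
    for i in range(points):
        x = lo + i * step
        weight = 0.5 if i in (0, points - 1) else 1.0
        density = math.exp(-((x - theta) ** 2) / (2.0 * sd * sd)) / (
            sd * math.sqrt(2.0 * math.pi)
        )
        total += weight * (c * x - theta) ** 2 * density
    return total * step

def biased_bound(c, theta, n):
    """((g + b)'(theta))^2 / I(theta) + b(theta)^2 with I(theta) = n."""
    derivative = c               # (g + b)(theta) = c * theta
    bias = (c - 1.0) * theta     # b(theta) = E_theta(T) - theta
    return derivative ** 2 / n + bias ** 2

print(mse_of_scaled_mean(0.8, 1.5, 5), biased_bound(0.8, 1.5, 5))
```

In this example the bound is attained for every $c$, and for $\theta$ near 0 a value $c < 1$ gives a smaller mean squared error than the unbiased choice $c = 1$.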

The hypotheses of Theorem 2.4.2 can be weakened as follows. Let

$$\underline{J}(\theta) := \liminf_{y\to\theta} (\mathrm{var}_\theta R_{y,\theta})/(y - \theta)^2 \quad\text{and}\quad
S(\theta) := \limsup_{y\to\theta} |g(y) - g(\theta)|/|y - \theta|.$$

Then if either $\underline{J}$ or $S$ is a limit as well as a lim inf or lim sup respectively, and at least
one is not zero, it will follow that $\mathrm{var}_\theta T \ge S^2(\theta)/\underline{J}(\theta)$. In particular, if $g'(\theta) \ne 0$ exists,
then $\mathrm{var}_\theta T \ge g'(\theta)^2/\underline{J}(\theta)$.

The information $I(\theta)$ equals $\underline{J}(\theta)$ if in the definition of $\underline{J}(\theta)$, the lim inf is a limit
$J(\theta)$ and the limit can be interchanged with the integral sign, as it can be under conditions
treated above.

If $\partial f(\theta,x)/\partial\theta$ exists for $\nu$-almost all $x$, then $I(\theta)$ is defined (possibly $+\infty$) and $I(\theta) \le
\underline{J}(\theta)$ by Fatou's Lemma (RAP, 4.3.3). Here $\mathrm{var}_\theta T \ge g'(\theta)^2/I(\theta)$ may not hold without
further hypotheses:

2.4.13 Proposition. There exist densities $f(\theta,x)$ with respect to Lebesgue measure on $\mathbb{R}$
defined for $-1 < \theta < 1$ such that $f(\cdot,\cdot)$ is jointly $C^\infty$ (infinitely differentiable) in both its
variables, with $f(\theta,x) > 0$ and $\partial f(\theta,x)/\partial\theta|_{\theta=0} = 0$ for all $x$, $I(0) = 0$, and $J(0) = +\infty$.
Also, $x$ is an unbiased estimator of $\theta$, $E_\theta x = \theta$, and $\mathrm{var}_0\, x = 1$. Thus the information
inequality $\mathrm{var}_0\, x \ge 1/I(0)$ fails.

Proof. Let $f$ be a nonnegative $C^\infty$ function which is even ($f(x) = f(-x)$) and has
compact support and $\int f(x)\,dx = 1$, such as, for the suitable normalizing constant $c$,

$$f(x) = \begin{cases} c \cdot \exp(-(1-x)^{-2} - (1+x)^{-2}), & \text{for } -1 < x < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Then $\int x f(x)\,dx = 0$. Let $h$ be the standard normal $N(0,1)$ density. Let

$$f(\theta,x) = \begin{cases} (1 - \theta^2)h(x) + \theta^2 f(x - \theta^{-1}), & \text{for } 0 < |\theta| < 1 \\ h(x), & \text{for } \theta = 0. \end{cases}$$

Then $f(\theta,x) > 0$ for all $x$ and $|\theta| < 1$. Since $h$ and $f$ both have mean 0, the mean
$\int_{-\infty}^{\infty} x f(\theta,x)\,dx$ is 0 for $\theta = 0$ and $\theta^2\theta^{-1} = \theta$ otherwise, so $x$ is an unbiased estimator of
$\theta$. For $x$ in any bounded interval, $f(\theta,x) = (1 - \theta^2)h(x)$ for $\theta$ in a neighborhood of 0,
specifically, for $|x| < M$ and $|\theta| < 1/(M+1)$. Since $f(\theta,x)$ is clearly $C^\infty$ in $x$ and $\theta$ for $\theta$
outside a neighborhood of 0, it is in fact jointly $C^\infty$ for all $x$ and for $-1 < \theta < 1$, with
$\partial f(\theta,x)/\partial\theta|_{\theta=0} = 0$ for all $x$. So $I(0) = 0$. For $y \ne 0$,

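$$\mathrm{var}_0 R_{y,0} = \int_{-\infty}^{\infty} h(x)^{-1} f(y,x)^2\,dx - 1
\ \ge\ y^4 \int_{-\infty}^{\infty} f(x - y^{-1})^2 h(x)^{-1}\,dx - 1
\ =\ y^4 \int_{-\infty}^{\infty} f(x - y^{-1})^2 \exp(x^2/2)(2\pi)^{1/2}\,dx - 1.$$

The latter integral goes to $+\infty$ as $y \to 0$, as $\exp(y^{-2}/2)$ or faster, since $f(x - y^{-1})$ is
supported where $|x - y^{-1}| < 1$. So $J(0) = +\infty > I(0) = 0$. The rest follows. □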

So, existence of integrals involving $\partial f(\theta,x)/\partial\theta$ does not guarantee that limits can be
interchanged with integrals, and the uniform integrability conditions in Theorems 2.4.7
and 2.4.8 can't simply be removed. In the example in the last proof, letting $\tau^2$ be the
variance of the law with density $f$,

$$\mathrm{var}_\theta\, x = E_\theta(x^2) - \theta^2 = (1 - \theta^2) \cdot 1 + \theta^2(\theta^{-2} + \tau^2) - \theta^2 = 2 - (2 - \tau^2)\theta^2$$

for $\theta \ne 0$ and 1 for $\theta = 0$. So the variance of $x$ is discontinuous at $\theta = 0$.
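
The mean and variance of the laws in Proposition 2.4.13 can be checked by numerical integration. The following sketch builds the bump function from the proof, normalizes it numerically, and integrates against $f(\theta, \cdot)$ with a trapezoid rule. The helper names, grid size, and integration range are assumptions of this example, and the constant `NORMALIZER` plays the role of $1/c$.

```python
import math

def bump(x):
    """Unnormalized even C-infinity function supported on (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-(1.0 - x) ** -2 - (1.0 + x) ** -2)

def integrate(func, lo, hi, points=20001):
    """Trapezoid rule on an evenly spaced grid."""
    step = (hi - lo) / (points - 1)
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, points - 1):
        total += func(lo + i * step)
    return total * step

NORMALIZER = integrate(bump, -1.0, 1.0)
TAU_SQUARED = integrate(lambda x: x * x * bump(x), -1.0, 1.0) / NORMALIZER

def density(theta, x):
    """f(theta, x) from the proof: a normal density mixed with a shifted bump."""
    normal = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    if theta == 0.0:
        return normal
    return (1.0 - theta ** 2) * normal + theta ** 2 * bump(x - 1.0 / theta) / NORMALIZER

def mean_and_variance(theta):
    """Mean and variance of x under f(theta, .), by numerical integration."""
    limit = 12.0 if theta == 0.0 else max(12.0, abs(1.0 / theta) + 2.0)
    mean = integrate(lambda x: x * density(theta, x), -limit, limit)
    second = integrate(lambda x: x * x * density(theta, x), -limit, limit)
    return mean, second - mean ** 2

print(mean_and_variance(0.5), 2.0 - (2.0 - TAU_SQUARED) * 0.25)
print(mean_and_variance(0.0))  # variance 1 at theta = 0, while the formula tends to 2
```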

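Suppose we have another parametrization of a family $\{P_\theta, \theta \in \Theta\}$ by a parameter $\psi$, where
$\theta = \theta(\psi)$ and the law for $\psi$ is $P_{\theta(\psi)}$, and that we want to estimate $g(\theta) = g(\theta(\psi))$. Then we have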

2.4.14 Theorem. If $\psi \mapsto \theta(\psi)$ is differentiable with a non-zero derivative then the
information inequality lower bound for $\mathrm{var}\,T$ is the same for the parametrization by $\psi$ as
for the parametrization by $\theta$.

Proof. In the change from parameter $\theta$ to parameter $\psi$ in the information inequality,
by the chain rule, both numerator and denominator are multiplied by $\theta'(\psi)^2 > 0$, not
changing the bound. □
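
A small numerical check of Theorem 2.4.14, assuming for illustration the Poisson family of Example (2) reparametrized by $\psi = \log\theta$ (a choice made here, not in the text). The score in $\theta$ is $j/\theta - 1$ and the score in $\psi$ is $j - e^\psi$; each information is computed by a truncated sum, and the two bounds for estimating $\theta$ from $n$ observations are compared.

```python
import math

def poisson_log_pmf(theta, j):
    """log P_theta(j) for the Poisson law with parameter theta > 0."""
    return -theta + j * math.log(theta) - math.lgamma(j + 1)

def information_theta(theta, terms=200):
    """Information for the parameter theta; the score is j/theta - 1."""
    return sum(
        (j / theta - 1.0) ** 2 * math.exp(poisson_log_pmf(theta, j))
        for j in range(terms)
    )

def information_psi(psi, terms=200):
    """Information for psi = log(theta); the score is j - exp(psi)."""
    theta = math.exp(psi)
    return sum(
        (j - theta) ** 2 * math.exp(poisson_log_pmf(theta, j))
        for j in range(terms)
    )

def bound_theta(theta, n):
    """g(theta) = theta has derivative 1 in theta."""
    return 1.0 / (n * information_theta(theta))

def bound_psi(psi, n):
    """The same target written as exp(psi) has derivative exp(psi) in psi."""
    return math.exp(psi) ** 2 / (n * information_psi(psi))

theta, n = 3.0, 5
print(bound_theta(theta, n), bound_psi(math.log(theta), n))  # both equal theta / n
```

Both the squared derivative and the information pick up the factor $\theta'(\psi)^2 = e^{2\psi}$, which cancels in the ratio.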

The information inequality is most useful in cases where there exists some unbiased
estimator $T$ whose variance attains the lower bound given in 2.4.3 for all $\theta$. It turns out
that under some regularity conditions (stronger than those needed for the information
inequality itself), the bound is attained only for densities of a certain "exponential" form,
which will be studied more generally and in more detail in the next section. Recall that a
function is called $C^1$ if it is everywhere differentiable with a continuous derivative.

2.4.15 Theorem. Assume the hypotheses of Theorem 2.4.3 for all $\theta$ in an open interval
$\Theta$ and that $0 < \mathrm{var}_\theta T < \infty$ for all $\theta$ and $\partial \log f(\theta,x)/\partial\theta$ is continuous in $\theta$ for almost all
$x$. Then the information inequality becomes an equation for all $\theta$ if and only if there exist
$C^1$ functions $c(\cdot)$ and $d(\cdot)$ of $\theta$ and a measurable function $h$ of $x$ such that for all $\theta$ and
almost all $x$,

$$f(\theta,x) = c(\theta)h(x)\exp(d(\theta)T(x)).$$

Proof. By the assumptions, $I(\theta)\,\mathrm{var}_\theta T > 0$ and $g$ is everywhere differentiable on the open
interval $\Theta$, so it is continuous. The proof of Theorem 2.4.3 gives $g'(\theta)^2 \le (\mathrm{var}_\theta T)I(\theta)$,
which must become an equation since the information inequality does. So, for each $\theta$,
the functions $T - g(\theta)$ and $\partial \log f/\partial\theta$ must be proportional (in the proof of RAP, 5.3.3,
$b^2 - 4ac = 0$ implies $\|f + tg\|^2 = 0$ for some $t$). So for each $\theta$, there is an $a(\theta)$ such that
$\partial \log f(\theta,x)/\partial\theta = a(\theta)(T(x) - g(\theta))$ for almost all $x$. Since $\mathrm{var}_\theta T > 0$ there is a set of $x$
of positive measure on which $T(x) \ne g(\theta)$, so $a(\theta)$ is uniquely determined. For the same
reason, there must exist some number $c$ such that $T(x) > c$ and $T(x') < c$ for $x$ and $x'$ in
sets $A$, $B$ of positive measure respectively, where also for $y = x$ or $x'$, $\partial \log f(\theta,y)/\partial\theta$ is
continuous in $\theta$. Thus $\partial \log f(\theta,x)/\partial\theta - \partial \log f(\theta,x')/\partial\theta$ is continuous in $\theta$ for any such
$x, x'$. For any given $\theta$, the difference equals $a(\theta)[T(x) - T(x')]$ for almost all $x \in A$ and
$x' \in B$. Taking any convergent sequence $\theta_j \to \theta_0$ of values of $\theta$, we have the equality for
almost all $x \in A$ and $x' \in B$ for all $\theta_j$, $j \ge 0$. Thus $a(\theta_j) \to a(\theta_0)$. A real-valued function
of a real variable, continuous along any sequence, is continuous, so $a(\cdot)$ is continuous. We
can then take an indefinite integral to get $\log f(\theta,x) = d(\theta)T(x) + u(x) - j(\theta)$ for some
measurable function $u(x)$ and $C^1$ functions $d(\theta)$ and $j(\theta)$. Taking the exponential of both
sides finishes the proof in one direction. Conversely, when functions are proportional, the
Cauchy-Bunyakovsky-Schwarz inequality always becomes an equation. □

On the regularity conditions needed for Theorem 2.4.15, see the Notes. 

PROBLEMS



1. Consider the family of exponential distributions with densities $f_c(x) = e^{-x/c}/c$ for $x > 0$
and 0 for $x < 0$, where $0 < c < \infty$. For $n$ observations, show that $\bar{X}$ is an unbiased
estimator of $c$ with variance $c^2/n$ and that this attains the minimum possible variance
given by the information inequality.

2. Suppose that the exponential distributions are parametrized by $\lambda = 1/c$ instead of $c$,
so that the densities are $h_\lambda(x) = \lambda e^{-\lambda x}$ for $x > 0$, and that we want to estimate $\lambda$, or
in other words we want to estimate the function $g(c) = 1/c$ for the parametrization in
Problem 1. Show that for $n = 1$ there is no unbiased estimator of $\lambda$, while for $n > 1$,
$(n-1)/(n\bar{X})$ is such an estimator. Compare its variance to the information inequality
lower bound. Hint: if $X_1, \ldots, X_n$ are i.i.d. with standard exponential density $f_1 = h_1$,
then $X_1 + \cdots + X_n$ has a gamma density $x^{n-1}e^{-x}/(n-1)!$ for $x > 0$ and 0 for $x < 0$.
Also, $cX_i$ are i.i.d. with density $f_c$.

3. For a fixed $h > 0$ consider the family $U_h$ of all uniform distributions on $[\theta, \theta + h]$ for
$\theta \in \mathbb{R}$, as in Problem 3 of Sec. 2.3. Show that for estimating $\theta$ with squared-error loss
$(T - \theta)^2$, there exists an unbiased estimator with variance of the order $1/n^2$ as the
sample size $n \to \infty$, which is smaller than can happen when the information inequality
applies. Explain why the information inequality fails to apply in this case. Hint: for
$\theta = 0$ and $h = 1$, show that the probability that $X_{(1)} \le x$ is $1 - (1-x)^n$ and then that
$EX_{(1)} = 1/(n+1)$.

4. Evaluate the information $I(\theta)$ in the following cases.

(a) Binomial distribution $B(n,p)$ for $n$ fixed, $0 < p < 1$.

(b) Geometric distribution $P(k) = (1-p)^{k-1}p$ for $k = 1, 2, \ldots$.

5. Evaluate the information of $N(0, \sigma^2)$, taking the parameter $\theta$ to be (a) $\sigma$, (b) $\sigma^2$.

NOTES 

Theorem 2.4.1 is due to Hammersley (1950) and was rediscovered by Chapman and
Robbins (1951). The notion of information $I(\theta)$ originated with Edgeworth (1908, 1909)
and was developed by Fisher (1922 and later papers), see Savage (1976).

The information inequality (2.4.3), $\mathrm{var}_\theta T \ge g'(\theta)^2/I(\theta)$, was first found by Fréchet
(1943) and extended by Darmois (1945). It was rediscovered by C. R. Rao (1945) and
Cramér (1946a; 1946b, pp. 475-476) and has been widely known as the "Cramér-Rao"
inequality. In view of the contributions of Fréchet and Darmois, L. J. Savage (1954)
proposed the name "information inequality." Rao (1945, equ. (3.2)) did not actually state
regularity conditions adequate to justify his interchange of limits. Cramér did, but in a
special case where not only $g(\theta) = \theta$ but $T(x) = x$.

Joshi (1976) gives an example of a location family, so that $f(\theta,x) = f(x - \theta)$, and an
estimator for which the information inequality becomes an equation for all $\theta$, $-\infty < \theta <
\infty$, but which does not have the exponential form given in Theorem 2.4.15. The given $f(\cdot)$
is not continuous, having some jumps, so for no $x$ is $f(\cdot,x)$ everywhere differentiable with
respect to $\theta$, and the hypothesis of 2.4.15 fails although for each $x$, the density is smooth
with respect to $\theta$ except for a few jumps. See Joshi (1976) for details.

The notes for this section are largely based on those in Lehmann (1983, p. 145). 